1. INTRODUCTION


In order to assess the performance of the Large Area Crop Inventory Experiment
(LACIE) system, several years of LACIE results are required. This presents two
problems: (1) it will be several years before these data are available; and
(2) the LACIE system is evolving from year to year, so the results obtained
over several years are actually representative of several different "LACIE
systems."

The LACIE Performance Predictor (LPP) is a set of computer programs which 
simulate the performance of a given LACIE system (i.e., the system used in a 
given year or phase of LACIE). The LPP can be used to evaluate the system by 
simulating the input and thereby simulating the results that would be obtained 
in several years of operation of that system, and can also be used to study 
the effect of various error sources on the final LACIE estimates. 

This study describes several runs that were made with the LPP, each of which 
simulated 15 years of LACIE Phase II operations. The runs correspond to dif- 
ferent sets of assumptions about the basic error sources in the LACIE system. 
2. THE LACIE PERFORMANCE PREDICTOR


The LPP simulates these major elements of the LACIE system: 

Segment acquisition 

Estimation of wheat proportion within the segment 

Yield estimation 

Area and production aggregation 

The procedures used to perform these simulation tasks are described in follow- 
ing sections. 

2.1 SEGMENT ACQUISITION

The first major task the LPP performs is to simulate the acquisition of sample 
segment data by the Landsat. The segments are located at the positions deter- 
mined by the LACIE Phase II allocation. The LPP calculates the orbit of the 
Landsat and prepares a file which contains the dates on which data was acquired 
by Landsat for each segment. 

Subsequently, an allowance is made for cloud cover as acquisitions with cloud
cover above a given threshold are not used by the LACIE system. Historical 
cloud cover data from weather observations is used to simulate the cloud cover 
on each acquisition of each segment. This is done by randomly choosing a 
cloud cover in such a way that the probability of a given cloud cover per- 
centage being selected is equal to the frequency it was observed in the past. 

If the simulated cloud cover is greater than the threshold value, the acquisi- 
tion is rejected. 
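
The cloud-cover screening described above can be sketched as follows. This is a minimal illustration and not the LPP code: the function name, the list of historical observations, and the 18-day spacing of the pass dates are assumptions made for the example. Drawing uniformly from the list of past observations selects each cloud-cover percentage with a probability equal to its observed frequency.

```python
import random


def simulate_acquisitions(pass_dates, historical_cover, threshold, rng):
    """Return the pass dates that survive the cloud-cover screen.

    pass_dates       -- day numbers on which the satellite passes the segment
    historical_cover -- past cloud-cover observations, in percent (hypothetical)
    threshold        -- acquisitions with cover above this value are rejected
    """
    kept = []
    for day in pass_dates:
        # A uniform choice from the raw observations reproduces their
        # empirical frequency distribution.
        cover = rng.choice(historical_cover)
        if cover <= threshold:
            kept.append(day)
    return kept


rng = random.Random(1)
history = [0, 10, 20, 30, 60, 80, 90, 100]   # illustrative values only
passes = list(range(1, 181, 18))              # assumed 18-day repeat cycle
kept = simulate_acquisitions(passes, history, threshold=50, rng=rng)
all_clear = simulate_acquisitions(passes, history, threshold=100, rng=rng)
none_clear = simulate_acquisitions(passes, history, threshold=-1, rng=rng)
```

Because each draw is made separately for each segment and date, this sketch shares the independence assumption between segments that the text attributes to the LPP.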

2.2 SIMULATION OF COUNTY PROPORTIONS

It is assumed that the county proportion $P_i$ for the i th county is
distributed according to a beta distribution; i.e.,

    $P_i \sim B(\mu_i, \sigma_i)$

where $\mu_i$ is the mean and $\sigma_i$ is the standard deviation of the
distribution.

The means $\mu_i$ are taken to be the 1975 proportions as determined by the
Statistical Reporting Service (SRS) of the U.S. Department of Agriculture
(USDA). For the i th county, the proportion is denoted by $P_{75,i}$. The
standard deviations $\sigma_i$ are calculated for each county by taking the
standard deviation of the historical proportions for that county for the
years 1965 through 1974. However, the LPP does not accept $P_{75,i}$ and
$\sigma_i$ directly, but instead requires the following inputs:

    $CV_{1,i} = \sigma_i / P_{H,i}$

    $SPW_i = (\mu_i - P_{H,i}) / P_{H,i}$

where

    $P_{H,i}$ = average proportion for the i th county for the years 1965
    through 1974

Both $CV_{1,i}$ and $SPW_i$ are manually calculated from $\sigma_i$,
$P_{75,i}$, and $P_{H,i}$ and used as inputs to the LPP, which calculates
$\mu_i$ and $\sigma_i$ for each county. A simulated "true" county proportion
$P_i$ is then calculated by the LPP for each county i by choosing a random
number generated for the distribution $B(\mu_i, \sigma_i)$.
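
A beta distribution specified by its mean and standard deviation has to be converted to the usual two shape parameters before it can be sampled. The sketch below shows one standard way to do this (method of moments); the function names are hypothetical and the source does not say how the LPP performs the conversion.

```python
import random


def beta_shapes_from_mean_sd(mean, sd):
    """Convert (mean, sd) to the beta shape parameters (a, b).

    Uses mean = a / (a + b) and var = mean * (1 - mean) / (a + b + 1).
    """
    if not 0.0 < mean < 1.0:
        raise ValueError("mean must lie strictly between 0 and 1")
    var = sd * sd
    if var <= 0.0 or var >= mean * (1.0 - mean):
        raise ValueError("sd is not attainable for a beta with this mean")
    nu = mean * (1.0 - mean) / var - 1.0      # nu = a + b
    return mean * nu, (1.0 - mean) * nu


def draw_county_proportion(mean, sd, rng):
    """Draw one simulated "true" county proportion."""
    a, b = beta_shapes_from_mean_sd(mean, sd)
    return rng.betavariate(a, b)


rng = random.Random(1975)
shape_a, shape_b = beta_shapes_from_mean_sd(0.20, 0.05)   # illustrative inputs
draws = [draw_county_proportion(0.20, 0.05, rng) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)
```

Reusing the same seed, as with `random.Random(1975)` here, reproduces the same sequence of county proportions from run to run.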


2.3 SIMULATION OF "TRUE" PROPORTIONS FOR SEGMENTS


For segments in county i, it is assumed that the segment proportions are
distributed according to a beta distribution $B(P_i, \sigma_i)$, where $P_i$
is the "true" county proportion described above and $\sigma_i^2$ is the
within-county variance of segment wheat proportions. In order to determine
$\sigma_i^2$, previous studies are used which provide an estimate of the
within-county variance of small-grain proportions. It is assumed that this
estimate is equal to $\sigma_i^2$. The studies are based on LACIE analysts'
interpretation of Landsat imagery of all the counties in the U.S. Great
Plains (USGP). Each county was partitioned into segments, and estimates were
made of the total agriculture (ag) proportion of each segment in each county.
These estimates were used to calculate the average ag proportion
$\bar{X}_{ag,i}$ and the ag variance $\sigma_{ag,i}^2$ for all of the
counties in the USGP.
Another task consisted of doing the same type of analysis to produce estimates
of average small-grain proportion $\bar{X}_{sg,i}$ and small-grain variance
$\sigma_{sg,i}^2$. However, the results for small grains are limited to a
subset of approximately 45 counties. A simple regression model based upon
$\bar{X}_{ag,i}$, $\sigma_{ag,i}^2$, $\bar{X}_{sg,i}$, and $\sigma_{sg,i}^2$
of the subset is used to obtain values of $\sigma_{sg,i}^2$ for all of the
counties. As stated previously, it is assumed that
$\sigma_i^2 = \sigma_{sg,i}^2$.


The LPP does not accept as input the variance $\sigma_i^2$ itself but the
coefficient of variation (CV) which is given by

    $CV_{2,i} = \sigma_i / P_i$


These could be calculated if the $P_i$ were known, but unfortunately the
$P_i$, which are calculated by the LPP, are not available in advance to
compute $CV_{2,i}$. Therefore, the following procedure is used to determine
the $CV_{2,i}$. First, the following quantities are calculated:

    $\bar{P}$ = [equation illegible in source; a sum over i = 1, ..., n]

    $CV_2$ = [equation illegible in source]


where n is the number of counties in a given state. The value obtained for
$CV_2$ is then input to the LPP for each of the nine states. For the counties
in each state, the LPP calculates the quantities:

    $\sigma_i^2 = (CV_2 \, P_i)^2$

These are taken as the estimates of the within-county variance of wheat
proportion to be used in the model.

A "true" wheat proportion $X_{ij}$ is then simulated for each segment j in the
i th county by choosing a random number generated for the distribution
$B(P_i, \sigma_i)$.
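
The segment-level step can be sketched in the same way: a single state-level CV is scaled by each county's simulated proportion to give the within-county standard deviation, and segment proportions are then drawn from a beta distribution with that mean and spread. The function name and the numerical values are hypothetical, and the method-of-moments conversion to beta shape parameters is an assumption, not something the source specifies.

```python
import random


def simulate_segment_proportions(county_p, state_cv, n_segments, rng):
    """Draw "true" wheat proportions for the segments of one county.

    county_p -- simulated "true" county proportion
    state_cv -- state-level coefficient of variation supplied as input
    """
    if not 0.0 < county_p < 1.0:
        raise ValueError("county proportion must lie strictly between 0 and 1")
    sd = state_cv * county_p                  # within-county standard deviation
    if sd <= 0.0:
        raise ValueError("state_cv must be positive")
    nu = county_p * (1.0 - county_p) / (sd * sd) - 1.0
    if nu <= 0.0:
        raise ValueError("CV too large for a beta with this mean")
    a, b = county_p * nu, (1.0 - county_p) * nu
    return [rng.betavariate(a, b) for _ in range(n_segments)]


rng = random.Random(7)
segments = simulate_segment_proportions(0.30, 0.50, 20000, rng)  # illustrative
segment_mean = sum(segments) / len(segments)
```

One consequence of using a common CV is that counties with larger proportions receive proportionally larger within-county spread.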


2.4 SIMULATION OF CLASSIFICATION AND MENSURATION SUBSYSTEM (CAMS) ESTIMATE


It is assumed that the CAMS estimates of wheat proportions for the j th
segment in the i th county are distributed according to the beta distribution

    $\hat{X}_{ij} \sim B(X_{ij} + B_{ij}, \sigma_{ij})$

where $B_{ij}$ is the CAMS bias for the j th segment in the i th county and
$\sigma_{ij}^2$ is the variance in the CAMS errors (i.e., in
$\hat{X}_{ij} - X_{ij}$) for these segments. The values $B_{ij}$ and
$\sigma_{ij}$ were estimated using blind site data and CAMS estimates as
follows:

    [equations for $B$, $\sigma$, and $\bar{X}$ illegible in source; one of
    them is a double sum over i = 1, ..., n and j = 1, ..., n_i]

where

    $N_B$ is the number of blind sites in the state
    $N$ is the number of segments in the state
    $n$ is the number of counties in the state
    $n_i$ is the number of segments in the i th county
    $X_k$ is the measured ground-truth wheat proportion (the "true" wheat
    proportion)
    $\hat{X}_k$ is the CAMS estimate of the wheat proportion for the k th
    blind site in the state.

The quantities $B/\bar{X}$ and $\sigma/\bar{X}$ are input to the LPP, which
performs a multiplication by $X_{ij}$ to obtain $B_{ij}$ and $\sigma_{ij}$.
Subsequently, it generates the $\hat{X}_{ij}$ by choosing a random number
generated for the distribution $B(X_{ij} + B_{ij}, \sigma_{ij})$.

Different values of $B$ and $\sigma^2$ are computed for each of the four
biowindows, and the appropriate values are used to simulate a value of
$\hat{X}_{ij}$ corresponding to an acquisition in a given biowindow. In
principle, one could also calculate values of $B$ and $\sigma^2$
corresponding to various combinations of biowindows. However, this was not
done for the runs described in this paper as not enough blind sites had the
required combinations of acquisitions. All of the estimates of $\hat{X}_{ij}$
described here correspond to a single acquisition.
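
The CAMS step can be sketched as follows, assuming the relative bias and relative standard deviation are supplied per biowindow and scaled by the segment's "true" proportion. The table of inputs, the function name, and the numbers are hypothetical; the conversion from mean and standard deviation to beta shape parameters is the usual method-of-moments one and is an assumption about the LPP.

```python
import random

# Hypothetical relative bias and relative standard deviation per biowindow.
CAMS_INPUTS = {1: (-0.20, 0.40), 2: (-0.15, 0.30), 3: (-0.10, 0.25), 4: (-0.10, 0.20)}


def simulate_cams_estimate(true_x, biowindow, rng, inputs=CAMS_INPUTS):
    """Draw one simulated CAMS estimate for a segment with proportion true_x."""
    rel_bias, rel_sd = inputs[biowindow]
    mean = true_x + rel_bias * true_x         # true value plus scaled bias
    sd = rel_sd * true_x                      # scaled standard deviation
    if not 0.0 < mean < 1.0 or sd <= 0.0:
        raise ValueError("biased mean must lie in (0, 1) and sd must be positive")
    nu = mean * (1.0 - mean) / (sd * sd) - 1.0
    if nu <= 0.0:
        raise ValueError("sd is not attainable for a beta with this mean")
    return rng.betavariate(mean * nu, (1.0 - mean) * nu)


rng = random.Random(4)
estimates = [simulate_cams_estimate(0.40, 4, rng) for _ in range(20000)]
estimate_mean = sum(estimates) / len(estimates)
```

With the illustrative biowindow 4 inputs, the simulated estimates centre on 0.36 for a segment whose "true" proportion is 0.40.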

2.5 SIMULATION OF YIELD ESTIMATES

Yield estimates are simulated by the LPP for each Crop Reporting District
(CRD) in the USGP. The "true" yield $Y_i$ for the i th CRD is taken to be the
1975 yield estimate by the USDA/SRS. The final yield estimates corresponding
to biowindow 4 for the i th CRD, $\hat{Y}_{4,i}$, are assumed to be
distributed normally; i.e.,

    $\hat{Y}_{4,i} \sim N(Y_i, \phi_{4,i})$

where the standard deviation $\phi_{4,i}$ is determined from the results of
the 10-year test made by the Center for Climatic and Environmental Assessment
(CCEA) of the yield model used in LACIE.

Standard deviations of yield estimates for earlier biowindows have to reflect
the increasingly unreliable nature of CCEA yield estimates made at earlier
dates in the growing season. To do this, each standard deviation input for
a CRD for a particular biowindow is assumed to be 4 percent larger than the
standard deviation input for the biowindow that followed in the season.
Working backwards from harvest,

    $\phi_{3,i} = 1.04 \, \phi_{4,i}$

    $\phi_{2,i} = 1.04 \, \phi_{3,i}$

    $\phi_{1,i} = 1.04 \, \phi_{2,i}$
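
The backward recursion compounds, so the biowindow 1 standard deviation is 1.04 cubed (about 1.125) times the final one. A minimal sketch, with hypothetical function names and an illustrative yield and standard deviation:

```python
import random


def biowindow_sds(final_sd, growth=1.04, n_windows=4):
    """Return {biowindow: sd}, working backwards from the final biowindow."""
    sds = {n_windows: final_sd}
    for window in range(n_windows - 1, 0, -1):
        sds[window] = growth * sds[window + 1]
    return sds


def simulate_yield(true_yield, sd, rng):
    """Draw one normally distributed yield estimate for a CRD."""
    return rng.gauss(true_yield, sd)


sds = biowindow_sds(final_sd=3.0)             # illustrative value
rng = random.Random(10)
early_estimate = simulate_yield(30.0, sds[1], rng)
final_estimate = simulate_yield(30.0, sds[4], rng)
```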

2.6 SIMULATION OF THE LACIE AGGREGATION PROCEDURE

The LPP simulates the LACIE aggregation procedure to produce estimates for 
each year of the harvested wheat area, the wheat yield, and the wheat pro- 
duction by CRD, state, region, and country. Estimates of the CV's (standard 
deviation divided by the "true" value) are also produced at these levels in 
a fashion identical to the way they are produced by the actual LACIE aggrega- 
tion procedures. 

Any LPP aggregation corresponds to a particular date, and the CAMS estimate
$\hat{X}_{ij}$ is based on the latest acquisition prior to that date. Also,
the time taken by the actual LACIE system to process an acquisition to the
point where it is ready for aggregation is not considered in the LPP.

Two kinds of aggregations are performed corresponding to the kinds of error 
included in the aggregation estimates. They are: 

Sampling error only, performed by aggregating the simulated "true" segment
proportions with yield set equal to $Y_i$.

Sampling, classification, and yield errors, performed by aggregating the CAMS
estimates $\hat{X}_{ij}$ to the CRD and multiplying by the yield estimates,
$\hat{Y}_i$, for the CRD to obtain a production estimate for the CRD. The
acreage and production estimates are then summed to obtain estimates for
higher levels.
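
The final roll-up step (production equals area times yield at the CRD level, then sums to higher levels) can be sketched as below. This deliberately omits how segment proportions are expanded to a CRD area estimate, which the source does not detail; the record layout, names, and numbers are hypothetical.

```python
from collections import defaultdict


def aggregate(crd_records):
    """Sum CRD area and production to the state and region levels.

    Each record holds a CRD's area estimate and yield estimate; production is
    formed at the CRD level before any summation.
    """
    totals = defaultdict(lambda: {"area": 0.0, "production": 0.0})
    for rec in crd_records:
        production = rec["area"] * rec["yield"]
        for level in (rec["state"], "USGP"):
            totals[level]["area"] += rec["area"]
            totals[level]["production"] += production
    return dict(totals)


records = [
    {"crd": "KS-10", "state": "Kansas", "area": 100.0, "yield": 30.0},
    {"crd": "KS-20", "state": "Kansas", "area": 50.0, "yield": 20.0},
    {"crd": "ND-10", "state": "North Dakota", "area": 80.0, "yield": 25.0},
]
totals = aggregate(records)
```

Passing "true" areas with "true" yields gives the sampling-error-only case; passing CAMS-based areas with simulated yield estimates gives the case with all three error sources.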
3. DESCRIPTION OF RUNS


In the evaluations described here, the outer loop shown in figure 3-1 is run
four times, once corresponding to no clouds and three times using the regular
cloud cover data, thereby producing four different sets of acquisition dates.
By design, they each produce the same set of values for the "true" county
proportions $P_i$. This is achieved by using the same random number seed for
the generation of the $P_i$ in all four runs. Each run of the outer loop
produces a data tape containing the results of that run, which is an input to
the inner loop (fig. 3-1), a separate set of programs. In all, two separate
runs can be made with the inner loop for each of the four runs made with the
outer loop, as listed in table 3-1. Each of the eight runs could be made to
simulate any desired number of "years" of the system.

The runs corresponding to S0, S1, S2, and S3 in table 3-1 were each made to
simulate 15 separate "years" of LACIE operations. After each area estimate
$\hat{A}$ for the nine-state USGP region was calculated, the CV for that and
all the previous years was calculated, as shown in figure 3-2. It appeared
that at 15 years the CV's had converged sufficiently well to a constant value
to stop the processing.
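
The convergence check amounts to a running CV over the years simulated so far. The sketch below assumes the sample standard deviation (n - 1 divisor) divided by the "true" value; the source does not state which divisor was used, and the function name and data are hypothetical.

```python
import statistics


def running_cv(estimates, true_value):
    """Return the CV after 2, 3, ..., len(estimates) simulated years."""
    return [
        statistics.stdev(estimates[:k]) / true_value
        for k in range(2, len(estimates) + 1)
    ]


area_estimates = [10.0, 12.0, 11.0, 9.0, 10.5]   # illustrative values only
cvs = running_cv(area_estimates, true_value=10.0)
```

Plotting `cvs` against the number of years gives a curve of the kind the text describes, which flattens once additional years no longer change the CV appreciably.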


TABLE 3-1.- RUNS MADE WITH THE LPP

                            Inner-loop runs
Outer-loop   ----------------------------------------------
run          Sampling error    Sampling, classification,
             only              and yield errors

1            S0                SCY0
2            S1                SCY1
3            S2                SCY2
4            S3                SCY3

Figure 3-1.- LPP data flow.















4. RESULTS 


4.1 SEGMENT ACQUISITION 

The results of these runs were used to make a study of the acquisition
simulation part of the LPP. The fraction of the sample segments having at
least one acquisition as determined by the LPP was plotted as a function of
time and compared with the fraction actually obtained in LACIE. The results
are shown in figure 4-1. The curve labeled A is the LPP result in the case
where zero cloud cover was assumed; i.e., the cloud cover simulator was
programmed to always produce a cloud cover of zero. By December 1, all of the
winter wheat segments had been acquired and the curve is flat until April 1,
when the acquisition of spring wheat sites begins. All sites had been
acquired at least once by July 1.

The three curves labeled B correspond to simulations of three different
"years" of LACIE operations where the only factor which varied was the cloud
cover (i.e., S1, S2, and S3). A threshold of 50 percent was used and was
chosen to obtain approximately the same total number of acquisitions over the
year as was obtained in LACIE Phase II.¹ The three curves are quite close
together, indicating only a small effect of cloud cover on acquisition
history. This is probably due to the fact that in the LPP it is assumed that
cloud cover at each segment is independent of the cloud cover at all other
segments, whereas in fact there is probably a high degree of correlation
between the amounts of cloud cover over segments that are reasonably close
together.

Curve C is the actual LACIE Phase II acquisition history. It is lower than
the curves produced by the LPP for all dates, partly because the cloud cover 
threshold of 50 percent was too high. Also, the discrepancy is quite large 
early in the year. The reason for this is not known. 


¹Actually the threshold was too high, and the three curves labeled B
correspond to about 15 percent more acquisitions than the 2249 acquisitions
in LACIE Phase II.
4.2 "TRUE" PROPORTIONS FOR SEGMENTS


The "true" wheat proportions for the blind site segments generated by the LPP
in run number S1 are compared with the actual blind site proportions in
figure 4-2. The LPP produced more segments with 0 to 4 percent wheat and more
segments with a high proportion of wheat (greater than 55 percent). A
Kolmogorov-Smirnov test was performed and showed that there was no significant
difference between the two distributions.
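
For reference, the two-sample Kolmogorov-Smirnov statistic is the largest vertical gap between the two empirical distribution functions, and for large samples it can be compared with an asymptotic critical value. The sketch below is a generic stdlib implementation; the source does not say how its test was computed or at what level, so the 5-percent default and the function names are assumptions.

```python
import bisect
import math


def ks_two_sample(sample_a, sample_b):
    """Return the two-sample K-S statistic D (maximum ECDF difference)."""
    a_sorted, b_sorted = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in a_sorted + b_sorted:
        # ECDF value at x = fraction of observations less than or equal to x.
        f_a = bisect.bisect_right(a_sorted, x) / len(a_sorted)
        f_b = bisect.bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(f_a - f_b))
    return d


def ks_critical_value(n, m, alpha=0.05):
    """Asymptotic critical value of D for sample sizes n and m."""
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2.0))
    return c_alpha * math.sqrt((n + m) / (n * m))


identical = ks_two_sample([1, 2, 3], [1, 2, 3])
disjoint = ks_two_sample([1, 2], [3, 4])
critical = ks_critical_value(100, 100)
```

Two samples are judged significantly different at level `alpha` when D exceeds the critical value; the approximation is intended for reasonably large samples.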

4.3 CAMS PROPORTION ESTIMATES

Originally it was planned to also make all the runs corresponding to column 2
in table 3-1. However, as each run took about 3 hours of computer time, it
was decided to drop the unrealistic case SCY0; as there was little difference
between the results of the runs of S1, S2, and S3, it was decided to drop
SCY2 and SCY3. Thus, the only run made which included more than sampling
error alone was SCY1, which used the same "true" county proportions as S1.

Figure 4-3 shows a histogram of the LACIE errors ($\hat{X}_k - X_k$) for all
of the blind sites in the USGP region. Figure 4-4 shows a histogram of the
errors simulated by the LPP in the run SCY1; i.e., $\hat{X}_{ij} - X_{ij}$ for
all blind sites in the USGP. These histograms should be similar if the LPP is
correctly simulating the results of the CAMS classification procedures.

Leptokurtosis (peakedness in the center of the distribution) is evident in
the LPP results. A comparison of the two distributions (figs. 4-3 and 4-4)
using Kolmogorov-Smirnov statistics showed that the distributions were
significantly different.

4.4 ACREAGE AND PRODUCTION ESTIMATES FOR THE USGP

The 15 different acreage and production estimates by the LPP for the USGP final 
prediction date (September 1) of the SCY1 run are plotted in figure 4-5. 

The abscissa and ordinate are respectively the relative differences of the
production and acreage estimates relative to the "true" values used by the
LPP for the 15 years. The "true" state wheat acreage value is obtained simply
by adding up all the "true" county acreages (which are simulated by the LPP).

Figure 4-2.- Comparison of model-generated sample-segment wheat
proportions with LACIE Phase II ground truth. (Axes: percent wheat in a
segment; number of segments.)

Figure 4-4.- LPP simulation of segment wheat proportion estimation errors.

The "true" state production value is obtained as follows: 

a. The "true" acreage for each CRD is determined by summing the "true" county
acreages for each county in the CRD.

b. The "true" acreage is multiplied by the "true" yield (input to the LPP at
the CRD level) to get "true" production for the CRD.

c. A sum is performed over all the CRD's in the state.

The LACIE Phase II result is also plotted in figure 4-5 except that it is
expressed as the percent differences from the last USDA/SRS figures.

The LACIE result is very close to the mode of the LPP values. A normal curve
with the mean and standard deviation of the production data is shown in
figure 4-6. The LACIE result is also shown. The relative bias in the LPP
production estimate is the value corresponding to the peak in this
distribution, -8.7 percent. At the 5-percent level of significance, the value
is not significantly different from the LACIE relative bias of -11.0 percent.
However, -8.7 percent is significantly different from zero, which indicates
that, if the assumptions made in the LPP are correct, the LACIE technology
will, on the average, produce an underestimate of wheat production. This
could be caused by (1) low segment proportion estimates, (2) low yield
estimates, or (3) a bias in the aggregation system. It has been shown
(reference) that the aggregation system was not biased, and the fact that the
actual LACIE aggregated acreage estimate has a larger (negative) relative
difference than the actual LACIE production estimate indicates that low
acreage estimates are the cause of the low production estimates. Finally, it
should be noted that a relative bias of -8.7 percent is too large to satisfy
the 90/90 criterion.
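
The interaction between bias and CV in the 90/90 criterion can be illustrated numerically. The sketch below assumes the criterion means that the estimate falls within 10 percent of the true value at least 90 percent of the time, and that relative errors are normally distributed; both the normality assumption and the CV of 0.05 used in the example are illustrative and not taken from the source.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_within_tolerance(rel_bias, cv, tol=0.10):
    """P(|estimate - true| / true <= tol) for a normal relative error.

    rel_bias -- mean relative error (for example -0.087)
    cv       -- standard deviation of the estimate divided by the true value
    """
    upper = normal_cdf((tol - rel_bias) / cv)
    lower = normal_cdf((-tol - rel_bias) / cv)
    return upper - lower


unbiased = prob_within_tolerance(0.0, 0.06)
biased = prob_within_tolerance(-0.087, 0.05)   # 0.05 is a hypothetical CV
meets_goal = biased >= 0.90
```

Under these assumptions an unbiased estimator with a CV of 0.06 just meets the goal, while a relative bias of -0.087 places the centre of the distribution so close to the lower limit that the goal is missed by a wide margin at the hypothetical CV of 0.05.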

4.5 OVERALL VARIABILITY OF AREA AND PRODUCTION ESTIMATES

Figure 4-7 shows (1) histograms of the estimated standard deviations
$\hat{\sigma}$ calculated for each of the 15 iterations, (2) the standard
deviations $\sigma$ of the area estimates produced by the 15 iterations, and
(3) the estimated standard deviations $\sigma_L$ of the LACIE estimate.² Each
of these is divided by the SRS acreage estimate. The number after the name of
each state is the number of segments "acquired" by the LPP for that state.
For most states there is a rather broad distribution of the $\hat{\sigma}$
for the 15 years. The width of the distribution is generally smaller when the
number of segments acquired in a state is larger. In particular, the states
with the largest number of acquisitions, Kansas and North Dakota, have quite
narrow distributions.

Figure 4-7.- CV's of acreage estimates by state.

There are two important observations to be made concerning these results: 

a. For every state except Texas and Colorado, $\sigma_L$ is smaller than all
of the $\hat{\sigma}$ values. This is partly caused by the tendency of CAMS
to overestimate the wheat in segments with low wheat proportions and
underestimate the wheat in segments with high wheat proportions, which
reduces the variance in the CAMS estimates. This phenomenon is apparent in
the significant difference between the distributions of proportion
estimation errors shown in figures 4-3 and 4-4.

b. With the exception of Kansas, South Dakota, and Texas, $\sigma$ falls near
the lower end of the distribution of the $\hat{\sigma}$, as expected, because
the formulas for calculating $\hat{\sigma}$ were designed to give a
conservative estimate (i.e., an upper bound). This result is important since
it implies that $\sigma_L$ as calculated by the CAS is also likely to be an
overestimate of the true LACIE standard deviation.

Figure 4-8 shows similar results for the standard deviations for production. 
These histograms are very similar to those in figure 4-7 and the same obser- 
vations apply. 


²$\hat{\sigma}$ is calculated by the LPP in the same manner that $\sigma_L$
is calculated by the Crop Assessment Subsystem (CAS).
Figure 4-8.- CV's of production estimates by state.








5. CONCLUSIONS 


The LPP has been used to replicate LACIE Phase II for a 15-year period using
accuracy assessment results for Phase II error components. The results
indicate that the LPP simulates the LACIE Phase II procedures reasonably well.

For the 15-year simulation, only 7 of the 15 production estimates were within
10 percent of the true production. Further, the simulations indicate that
the acreage estimator, based on CAMS Phase II procedures, has a negative bias.
This bias is too large to support the 90/90 criterion with the CV observed
and simulated for the Phase II production estimator. The results of this
simulation study validate the theory that the acreage variance estimator in
LACIE is conservative. The simulated results also indicate that the estimated
variance for the production estimator is conservative; that is, it tends to
be larger than the true variance of the production estimator. Hence, more
bias can be tolerated than indicated by the estimated CV. However, even with
a reduction in the estimated CV to account for this overestimation, the bias
indicated by the simulations is still too large to support the 90/90 accuracy
goal.